Node Architecture and Performance Evaluation of the Hitachi Super Technical Server SR8000
Authors
Abstract
A new architecture for the Hitachi super technical server SR8000 has been developed. Each node of the SR8000, a RISC-based SMP, delivers 8 GFLOPS, and a system of up to 128 nodes connected by an interconnect network delivers 1 TFLOPS. A node of the SR8000 provides a new COMPAS (CO-operative MicroProcessors in single Address Space) architecture and an improved PVP (Pseudo Vector Processing) architecture. COMPAS provides rapid simultaneous start-up of all microprocessors in a node. PVP provides stable, high data-reference throughput even when the node processes a data set larger than the cache. Together, these architectures give the node performance equivalent to that of a vector processor. The new features of COMPAS and PVP are inter-processor communication and multiple outstanding prefetching. We evaluate the node performance of the SR8000 and compare it with that of Hitachi's predecessor vector processor, the S-3800. We demonstrate that the node performance of the SR8000 is superior to that of the S-3800.
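As a quick consistency check on the figures quoted in the abstract, the aggregate peak follows from multiplying the per-node peak by the maximum node count. The symbols $P_{\text{node}}$, $N_{\text{max}}$, and $P_{\text{system}}$ are introduced here for illustration only and do not come from the paper.

```latex
P_{\text{system}} = N_{\text{max}} \times P_{\text{node}}
                  = 128 \times 8\ \text{GFLOPS}
                  = 1024\ \text{GFLOPS}
                  \approx 1\ \text{TFLOPS}
```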
Similar resources
Implementation and Evaluation of OpenMP for Hitachi SR8000
This paper describes the implementation and evaluation of the OpenMP compiler designed for the Hitachi SR8000 Super Technical Server. The compiler performs parallelization for the shared memory multiprocessors within a node of the SR8000 using the synchronization mechanism of the hardware to perform high-speed parallel execution. To create an optimized code, the compiler can perform optimizations ac...
A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of vector symmetric multiprocessor (SMP) nodes. The three-dimensional FFT algorithm can be altered into a multirow FFT algorithm to expand the innermost loop length. We use the multirow FFT algorithm to implement the parallel three-dimensional FFT algorithm. Performance res...
Three-Level Hybrid Parallelization of Large-Scale Data Visualization for the Earth Simulator
High parallel performance is the most distinguished feature of the parallel visualization subsystem in GeoFEM. This paper describes some strategies we adopted to improve the parallel performance of our subsystem for the Earth Simulator. The three-level hybrid parallelization has been applied, including message passing for inter-SMP node communication, loop directives by OpenMP for intra-SMP node parall...
Pseudo-Vectorization and RISC Optimization Techniques for the Hitachi SR8000 Architecture
Vector supercomputers have been carrying the performance crown in numerical applications for over two decades. Superior memory bandwidth, data parallelism and pipelining abilities have made these systems the premier choice for top-notch research. With the advent of powerful superscalar RISC processors in the mid-80s, a change in programming style was required that could not be easily implemente...
Automatic Performance Tuning for the Multi-section with Multiple Eigenvalues Method for Symmetric Tridiagonal Eigenproblems
We propose multisection for the multiple eigenvalues (MME) method for determining the eigenvalues of symmetric tridiagonal matrices. We also propose a method using runtime optimization, and show how to optimize its performance by dynamically selecting the implementation parameters. Performance results using a Hitachi SR8000 supercomputer with eight processors per node yield (1) up to 6.3x speed...